Fast DD-classification of functional data
A fast nonparametric procedure for classifying functional data is introduced.
It consists of a two-step transformation of the original data plus a classifier
operating on a low-dimensional hypercube. The functional data are first mapped
into a finite-dimensional location-slope space and then transformed by a
multivariate depth function into the DD-plot, which is a subset of the unit
hypercube. This transformation yields a new notion of depth for functional
data. Three alternative depth functions are employed for this, as well as two
rules for the final classification on the DD-plot. The resulting classifier has
to be cross-validated over a small range of parameters only, which is
restricted by a Vapnik-Chervonenkis bound. The entire methodology does not
involve smoothing techniques, is completely nonparametric, and allows Bayes
optimality to be achieved under standard distributional settings. It is robust,
efficiently computable, and has been implemented in an R environment.
Applicability of the new approach is demonstrated by simulations as well as a
benchmark study.
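As an illustration of the DD-plot step only, here is a minimal sketch in Python (not the authors' implementation): it assumes the functional data have already been mapped to a finite-dimensional space, uses the Mahalanobis depth as a stand-in for the three depth functions employed in the paper, and replaces the cross-validated classification rules by the simple maximum-depth rule.

    # Minimal sketch of the DD-plot step, assuming already finite-dimensional data;
    # Mahalanobis depth and the maximum-depth rule are illustrative stand-ins.
    import numpy as np

    def mahalanobis_depth(points, cloud):
        """Mahalanobis depth of each row of `points` w.r.t. the data cloud `cloud`."""
        mu = cloud.mean(axis=0)
        cov_inv = np.linalg.inv(np.cov(cloud, rowvar=False))
        diff = points - mu
        md2 = np.einsum("ij,jk,ik->i", diff, cov_inv, diff)  # squared Mahalanobis distances
        return 1.0 / (1.0 + md2)

    def dd_plot(points, class0, class1):
        """Map points into the unit square: (depth w.r.t. class 0, depth w.r.t. class 1)."""
        return np.column_stack([mahalanobis_depth(points, class0),
                                mahalanobis_depth(points, class1)])

    def max_depth_classify(points, class0, class1):
        d = dd_plot(points, class0, class1)
        return (d[:, 1] > d[:, 0]).astype(int)  # 1 if deeper in class 1, else 0

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        class0 = rng.normal(0.0, 1.0, size=(200, 2))
        class1 = rng.normal(2.0, 1.0, size=(200, 2))
        queries = np.array([[0.2, 0.1], [1.9, 2.2]])
        print(max_depth_classify(queries, class0, class1))  # expected: [0 1]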
Fast computation of Tukey trimmed regions and median in dimension p > 2
Given data in $\mathbb{R}^p$, a Tukey $\kappa$-trimmed region is the set of
all points that have at least Tukey depth $\kappa$ w.r.t. the data. As they are
visual, affine equivariant and robust, Tukey regions are useful tools in
nonparametric multivariate analysis. While these regions are easily defined and
interpreted, their practical use in applications has been impeded so far by the
lack of efficient computational procedures in dimension $p > 2$. We construct
two novel algorithms to compute a Tukey $\kappa$-trimmed region, a naïve
one and a more sophisticated one that is much faster than known algorithms.
Further, a strict bound on the number of facets of a Tukey region is derived.
In a large simulation study the novel fast algorithm is compared with the
na\"{i}ve one, which is slower and by construction exact, yielding in every
case the same correct results. Finally, the approach is extended to an
algorithm that calculates the innermost Tukey region and its barycenter, the
Tukey median.
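The exact algorithms of the paper are not reproduced here; the following minimal Python sketch only approximates the Tukey (halfspace) depth of a point by minimizing over random directions, which gives an upper bound on the exact depth, and uses it to test membership in a kappa-trimmed region. The number of directions and the value of kappa are arbitrary illustration choices.

    # Minimal sketch: random-direction approximation of the Tukey (halfspace) depth
    # and an approximate membership test for the kappa-trimmed region. This is not
    # the paper's exact algorithm; the result is an upper bound on the true depth.
    import numpy as np

    def approx_tukey_depth(x, data, n_dirs=1000, rng=None):
        rng = np.random.default_rng() if rng is None else rng
        dirs = rng.normal(size=(n_dirs, data.shape[1]))
        dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)   # random unit directions
        proj_data = data @ dirs.T                             # (n, n_dirs) projections of the data
        proj_x = x @ dirs.T                                    # (n_dirs,) projections of the query
        # for each direction, the fraction of data points in the halfspace {y : u'y <= u'x}
        counts = (proj_data <= proj_x).mean(axis=0)
        return counts.min()

    def in_trimmed_region(x, data, kappa, n_dirs=1000, rng=None):
        return approx_tukey_depth(x, data, n_dirs, rng) >= kappa

    if __name__ == "__main__":
        rng = np.random.default_rng(1)
        data = rng.normal(size=(500, 3))
        print(in_trimmed_region(np.zeros(3), data, kappa=0.2, rng=rng))      # central point: True
        print(in_trimmed_region(np.full(3, 4.0), data, kappa=0.2, rng=rng))  # remote point: False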
Depth and Depth-Based Classification with R Package ddalpha
Following the seminal idea of Tukey (1975), data depth is a function that measures how close an arbitrary point of the space lies to an implicitly defined center of a data cloud. Having undergone theoretical and computational developments, it is now employed in numerous applications, with classification being the most popular one. The R package ddalpha is software designed to fuse the user's experience with recent achievements in the area of data depth and depth-based classification. ddalpha provides implementations for exact and approximate computation of the most reasonable and widely applied notions of data depth. These can further be used in the depth-based multivariate and functional classifiers implemented in the package, with the DDα-procedure as the main focus. The package is expandable with user-defined custom depth methods and separators. The implemented functions for depth visualization and the built-in benchmark procedures may also serve to provide insights into the geometry of the data and the quality of pattern recognition.
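The package itself is written in R; purely as a hedged illustration in Python (not the ddalpha API), the sketch below implements one widely applied depth notion, the spatial depth, in the plug-in style described above: any function mapping a point and a data cloud to a centrality value in [0, 1] could play the role of a user-defined custom depth.

    # Hedged illustration, not the ddalpha R API: the spatial depth of a point x is
    # 1 minus the norm of the average unit vector pointing from the data points
    # towards x, so it is close to 1 near the center of the cloud and close to 0 far away.
    import numpy as np

    def spatial_depth(x, data, eps=1e-12):
        diff = x - data                                    # vectors from data points to x
        norms = np.linalg.norm(diff, axis=1, keepdims=True)
        units = diff / np.maximum(norms, eps)              # guard against x coinciding with a data point
        return 1.0 - np.linalg.norm(units.mean(axis=0))

    if __name__ == "__main__":
        rng = np.random.default_rng(2)
        data = rng.normal(size=(300, 2))
        print(spatial_depth(np.zeros(2), data))            # near the center: close to 1
        print(spatial_depth(np.array([5.0, 5.0]), data))   # far from the cloud: close to 0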
Tailoring Mixup to Data using Kernel Warping functions
Data augmentation is an essential building block for learning efficient deep
learning models. Among all augmentation techniques proposed so far, linear
interpolation of training data points, also called mixup, has been found to be
effective across a wide range of applications. While the majority of works have
focused on selecting the right points to mix, or applying complex non-linear
interpolation, we are interested in mixing similar points more frequently and
strongly than less similar ones. To this end, we propose to dynamically change
the underlying distribution of interpolation coefficients through warping
functions, depending on the similarity between data points to combine. We
define an efficient and flexible framework to do so without losing
diversity. We provide extensive experiments for classification and regression
tasks, showing that our proposed method improves both performance and
calibration of models. Code is available at
https://github.com/ENSTA-U2IS/torch-uncertaint
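A minimal sketch of the idea under stated assumptions: the similarity measure (a Gaussian kernel on Euclidean distance) and the warping (a similarity-dependent power transform of the Beta-sampled coefficient) are illustrative placeholders, not the kernel warping functions proposed in the paper or implemented in the linked repository.

    # Minimal sketch of similarity-dependent mixup with assumed ingredients: a Gaussian
    # kernel on Euclidean distance as similarity, and a power warping that pushes the
    # interpolation coefficient towards 0.5 for similar pairs (stronger mixing) and
    # towards 0 or 1 for dissimilar pairs (weaker mixing). Not the paper's warping family.
    import numpy as np

    def similarity(x1, x2, bandwidth=1.0):
        return np.exp(-np.sum((x1 - x2) ** 2) / (2.0 * bandwidth ** 2))

    def warp(lam, sim):
        """Warp lam in [0, 1]: deviations from 0.5 shrink when sim is high, grow when low."""
        tau = 0.5 + sim                      # assumed warping strength in (0.5, 1.5]
        dev = 2.0 * abs(lam - 0.5)           # rescaled deviation in [0, 1]
        return 0.5 + np.sign(lam - 0.5) * 0.5 * dev ** tau

    def warped_mixup(x1, y1, x2, y2, alpha=1.0, rng=None):
        rng = np.random.default_rng() if rng is None else rng
        lam = warp(rng.beta(alpha, alpha), similarity(x1, x2))
        return lam * x1 + (1.0 - lam) * x2, lam * y1 + (1.0 - lam) * y2

    if __name__ == "__main__":
        rng = np.random.default_rng(3)
        x1, x2 = rng.normal(size=4), rng.normal(size=4)
        y1, y2 = np.array([1.0, 0.0]), np.array([0.0, 1.0])   # one-hot labels
        print(warped_mixup(x1, y1, x2, y2, rng=rng))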
Choosing among notions of multivariate depth statistics
Classical multivariate statistics measures the outlyingness of a point by its
Mahalanobis distance from the mean, which is based on the mean and the
covariance matrix of the data. A multivariate depth function is a function
which, given a point and a distribution in d-space, measures centrality by a
number between 0 and 1, while satisfying certain postulates regarding
invariance, monotonicity, convexity and continuity. Accordingly, numerous
notions of multivariate depth have been proposed in the literature, some of
which are also robust against extremely outlying data. The departure from
classical Mahalanobis distance does not come without cost. There is a trade-off
between invariance, robustness and computational feasibility. In the last few
years, efficient exact algorithms as well as approximate ones have been
constructed and made available in R-packages. Consequently, in practical
applications the choice of a depth statistic is no longer restricted to one or
two notions by computational limits; rather, several notions are often
feasible, among which the researcher has to decide. The article debates
theoretical and practical aspects of this choice, including invariance and
uniqueness, robustness and computational feasibility. Complexity and speed of
exact algorithms are compared. The accuracy of approximate approaches like the
random Tukey depth is discussed as well as the application to large and
high-dimensional data. Extensions to local and functional depths and
connections to regression depth are briefly addressed.
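For orientation, the two notions the comparison starts from can be written down explicitly. These are the standard definitions (stated here as a reader aid, not quoted from the article), with mean $\mu$, covariance $\Sigma$ and a distribution $P$ on $\mathbb{R}^d$:

    % Mahalanobis depth: centrality decreases with the Mahalanobis distance from the mean
    D_{\mathrm{Mah}}(x) = \frac{1}{1 + (x - \mu)^{\top} \Sigma^{-1} (x - \mu)},
    \qquad
    % Tukey (halfspace) depth: smallest probability mass of a closed halfspace containing x
    D_{\mathrm{Tuk}}(x) = \inf_{\|u\| = 1} P\{\, y \in \mathbb{R}^{d} : u^{\top} y \le u^{\top} x \,\}.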